An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets

Authors

  • Sarojini Balakrishnan
  • Ramaraj Narayanaswamy
  • Ilango Paramasivam
  • David J. Russomanno
Abstract

Medical data are multidimensional, and hundreds of independent features in these high-dimensional databases must be considered and analyzed to extract valuable decision-making information for medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed. Feature selection is a preprocessing step that aims to reduce the dimensionality of the data by selecting the most informative features that influence the diagnosis of the disease. We propose a feature-selection-embedded Hybrid Prediction Model that combines two data mining functionalities: clustering and classification. The F-score feature selection method and k-means clustering select the optimal feature subsets of the medical datasets, which enhance the performance of the Support Vector Machine (SVM) classifier. The performance of the SVM classifier is empirically evaluated on the reduced feature subsets of the Diabetes, Breast Cancer, and Heart Disease datasets. The proposed model is validated using four measures: classifier accuracy, area under the ROC curve, sensitivity, and specificity. The results show that the proposed model improves the predictive power of the classifier and reduces false positive and false negative rates. The proposed method achieves a predictive accuracy of 98.9427% on the diabetes dataset, 99% on the cancer dataset, and 100% on the heart disease dataset, the highest predictive accuracy for these datasets compared to other models reported in the literature.

General Terms: Data Mining, Dimensionality Reduction, Feature Selection, Prediction Model
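As a rough illustration of the feature-ranking and validation ideas named in the abstract, the sketch below scores each feature with a Fisher-style F-score and computes sensitivity and specificity from binary predictions. The abstract does not give the exact formula, so the F-score definition used here (squared distances between class means and the overall mean, divided by the sum of within-class sample variances) is an assumption. The function names and toy data are hypothetical, and the k-means and SVM stages of the model are not reproduced.

```python
from statistics import mean, variance


def f_score(values, labels):
    """Fisher-style F-score of one feature for binary labels (1 = positive, 0 = negative).

    Assumed definition: between-class separation divided by the sum of the
    within-class sample variances. Each class needs at least two samples.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Guard against division by zero when both classes are constant.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return (feature indices sorted by descending F-score, list of scores)."""
    n_features = len(rows[0])
    scores = [f_score([row[j] for row in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [
    [1.0, 5.0],
    [1.2, 3.0],
    [0.9, 4.0],
    [3.0, 4.5],
    [3.1, 3.5],
    [2.9, 4.0],
]
labels = [0, 0, 0, 1, 1, 1]

order, scores = rank_features(rows, labels)
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0])
```

In a pipeline like the one the abstract describes, features with low scores would be dropped before the classifier is trained. The cut-off for "low" is a design choice that the abstract does not specify.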

Similar resources

AN EXTENDED FUZZY ARTIFICIAL NEURAL NETWORKS MODEL FOR TIME SERIES FORECASTING

Improving time series forecasting accuracy is an important yet often difficult task. Both theoretical and empirical findings have indicated that integration of several models is an effective way to improve predictive performance, especially when the models in combination are quite different. In this paper, a model of the hybrid artificial neural networks and fuzzy model is proposed for time series for...

A Novel Intelligent Energy Management Strategy Based on Combination of Multi Methods for a Hybrid Electric Vehicle

In response to the problems caused by today's conventional vehicles, much attention has been devoted to fuel cell vehicle research. However, a fuel cell system alone is not adequate for transportation applications, because the load power profile includes transients that are not compatible with fuel cell dynamics. To resolve this problem, hybridization of the fuel cell and energy storage device...

Performance evaluation of gang saw using hybrid ANFIS-DE and hybrid ANFIS-PSO algorithms

One of the most significant and effective criteria in the process of cutting dimensional rocks using the gang saw is the maximum energy consumption rate of the machine, and its accurate prediction and estimation can help designers and owners of this industry to achieve an optimal and economic process. In the present research work, it is attempted to study and provide models for predicting the m...

Which Methodology is Better for Combining Linear and Nonlinear Models for Time Series Forecasting?

Both theoretical and empirical findings have suggested that combining different models can be an effective way to improve on the predictive performance of each individual model. This is especially the case when the models in the ensemble are quite different. Hybrid techniques that decompose a time series into its linear and nonlinear components are one of the most important kinds of the hybrid model...

An Improved Hybrid Model with Automated Lag Selection to Forecast Stock Market

Objective: In general, financial time series such as stock indexes have nonlinear, mutable and noisy behavior. Structural and statistical models and machine learning-based models are often unable to accurately predict series with such a behavior. Accordingly, the aim of the present study is to present a new hybrid model using the advantages of the GMDH method and Non-dominated Sorting Genetic A...

An Integrated Model of Project Scheduling and Material Ordering: A Hybrid Simulated Annealing and Genetic Algorithm

This study aims to deal with a more realistic combined problem of project scheduling and material ordering. The goal is to minimize the total material holding and ordering costs by determining the starting time of activities along with material ordering schedules subject to some constraints. The problem is first mathematically modelled. Then a hybrid simulated annealing and genetic algorithm is...

Publication date: 2011